PROGRESSIVE ADAPTIVE TIME STAMP RESOLUTION IN MULTIMEDIA AUTHORING



CROSS-REFERENCE TO RELATED APPLICATION 



This application is a continuation-in-part of provisional patent
application Serial No. 60/106,764, filed November 3, 1998, the benefit of the
filing date of which is hereby claimed for the commonly disclosed subject
matter.



The present invention generally relates to composing and playing
multimedia presentations and, more particularly, to flexible time stamp
information carried in the stream descriptor of the multimedia presentation.



Multimedia authoring systems exist that allow the user (i.e., the
author) to insert multimedia objects, such as video, audio, still pictures, and
graphics, into a multimedia presentation at a certain spatial position and with a
certain temporal location. Such an authoring system is typically used to create
presentations in the MPEG-4 format (MPEG: Moving Picture Experts Group) or
the SMIL (Synchronized Multimedia Integration Language) format.

In more advanced authoring systems, the temporal location of the
multimedia objects need not be absolute in time, but can be defined relative to
other multimedia objects. This means that, for example, a video clip can be
authored to start at the same time that a specific audio clip starts. Another such
example is that after completely playing a certain video clip, another video
clip should be played, possibly with some delay. The essence of this is that
multimedia objects have start and end times that are defined with respect to
the start and end times of other multimedia objects, with possible temporal
offsets (delays).

A further feature of advanced temporal authoring of multimedia
objects is the possibility to have a range in the duration of multimedia objects.
For example, a certain video clip has a certain duration when played at the
speed at which it was captured, say thirty frames per second. This feature
allows authors to define a range in the playback speed, for example between
fifteen frames per second (slow motion by a factor of two) and sixty frames per
second (fast play by a factor of two). This results in a maximum and a
minimum total playback duration, respectively. In general, the advanced
authoring systems allow authors to specify such ranges in multimedia object
playback duration. Note that it is still possible to dictate only one specific
playback duration (which is directly related to the playback speed in the case
of video, audio, or animation) by restricting the duration range to a zero width.
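As a rough illustration of how a playback-speed range becomes a duration range, the sketch below assumes a video clip described only by its frame count (a hypothetical 120-frame clip, i.e., four seconds at thirty frames per second). The helper `duration_range` is illustrative and not part of any authoring system.

```python
def duration_range(frame_count: int, min_fps: float, max_fps: float) -> tuple[float, float]:
    """Return (minimum, maximum) playback duration in seconds for a clip of
    `frame_count` frames that may be played at any speed in [min_fps, max_fps].
    Hypothetical helper, for illustration only."""
    if not 0 < min_fps <= max_fps:
        raise ValueError("need 0 < min_fps <= max_fps")
    # Playing faster shortens the clip, so max_fps yields the minimum duration
    # and min_fps yields the maximum duration.
    return frame_count / max_fps, frame_count / min_fps


# 120 frames between 15 and 60 frames per second: 2 to 8 seconds.
print(duration_range(120, 15.0, 60.0))  # (2.0, 8.0)
# A zero-width speed range pins the duration to a single value.
print(duration_range(120, 30.0, 30.0))  # (4.0, 4.0)
```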

If we now combine the relative start and end times of multimedia
objects in the authoring system with the possibility to also specify a duration
range, we see that a complete authored multimedia presentation is a complex
but flexible system of interconnected objects with variable durations. The
advantage of this flexibility in duration lies in the data transmission
and playback of multimedia objects. By not having very strict multimedia start
and end times, the system has some flexibility to adapt to data delivery
problems, which may be due to network congestion or transmission errors. For
the final delivery and playback, the system (which may be the server or the
client) resolves the true multimedia object start and end times during
transmission and playback, adapting to the environment.

In general, with these variable object durations, many actual values for
the start and end times of the multimedia objects are possible, especially
when no delivery problems occur. In actual playback, however, absolute time
stamps must be used. That means that for every multimedia object a playback
duration is chosen which lies within the range of its possible durations. The
problem addressed here is determining these actual durations at run time (i.e.,
during playback). The method is progressive in time; that is, it resolves the
absolute time stamps as time advances, making it adaptive to the changing
environment. Finally, the information that must be sent to a client, sufficient
for it to perform the time stamp resolution, has to be defined.

SUMMARY OF THE INVENTION 

It is therefore an object of the present invention to provide a technique
for determining the actual durations of multimedia objects at run time.

It is another object of the invention to provide a new dedicated 
descriptor of object time duration to alleviate the problem of unreliable 
delivery of objects in a multimedia presentation. 

According to the invention, the solution to the problem consists of two
parts. First, it is necessary to define what information must be available to the
client in order to be able to determine the multimedia object durations. Second,
the durations themselves must be resolved. The new flexible timing information
can be used by the client to adapt the timing of the ongoing presentation to the
environment, while having more room to stay within the presentation author's
intent and expectations.

Six steps are used to resolve the actual time of a label and the
corresponding durations of the multimedia objects that have that label as their
end time. In the first step, all the dependency relations are collected for the
label $P_x$ by taking all objects $n$ that have $P_x$ as the label for their end
time:

$$t_n + \mathrm{minimum}(n) \le t_x \le t_n + \mathrm{maximum}(n), \quad n = 1, \ldots, N$$

Here $t_n$ is the start time of object $n$, and $N$ is the number of objects.

In the second step, the $N$ relations are used to calculate the tightest
bounds on $t_x$:

$$\min\{t_x\} \le t_x \le \max\{t_x\}$$

with

$$\min\{t_x\} = \max_{n = 1, \ldots, N} \{t_n + \mathrm{minimum}(n)\}$$
$$\max\{t_x\} = \min_{n = 1, \ldots, N} \{t_n + \mathrm{maximum}(n)\}$$
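The first two steps can be sketched in a few lines of Python. This is a minimal illustration, assuming each object that ends at $P_x$ is given as a hypothetical `(t_n, minimum(n), maximum(n))` tuple with $t_n$ already resolved; raising an error when the bounds cross is an assumption, since that case is not addressed here.

```python
def label_time_bounds(objects):
    """Steps 1 and 2: tightest bounds (min{t_x}, max{t_x}) on label time t_x.

    `objects` holds one (t_n, minimum(n), maximum(n)) tuple for every object n
    whose end-time label is Px; each t_n must already be resolved.
    """
    if not objects:
        raise ValueError("no object has this label as its end time")
    # Each object n contributes t_n + minimum(n) <= t_x <= t_n + maximum(n).
    t_min = max(t_n + d_min for t_n, d_min, _ in objects)
    t_max = min(t_n + d_max for t_n, _, d_max in objects)
    if t_min > t_max:
        # Assumption: an empty interval is simply reported to the caller.
        raise ValueError(f"no feasible t_x: {t_min} > {t_max}")
    return t_min, t_max


# The two objects ending at P3 in the example of the detailed description.
print(label_time_bounds([(0.0, 7.0, 8.0), (4.5, 3.0, 4.0)]))  # (7.5, 8.0)
```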
In the third step, the bounds on the duration of each object $n$ are
recalculated by using

$$\mathrm{duration}(n) = t_x - t_n$$

to get

$$\min\{t_x\} - t_n \le \mathrm{duration}(n) \le \max\{t_x\} - t_n, \quad n = 1, \ldots, N$$
In the fourth step, the preferred duration of each object $n$ is
recalculated:

    if (preferred(n) < min{t_x} - t_n) then
        preferred(n) = min{t_x} - t_n
    else if (preferred(n) > max{t_x} - t_n) then
        preferred(n) = max{t_x} - t_n
    end if
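Steps 3 and 4 for a single object amount to clamping its preferred duration into the recalculated duration range. A minimal Python sketch follows; the function name and argument order are hypothetical.

```python
def clamp_preferred(preferred, t_n, t_min, t_max):
    """Steps 3 and 4 for one object n: return preferred(n) moved into the
    recalculated duration range [min{t_x} - t_n, max{t_x} - t_n]."""
    d_lo = t_min - t_n  # step 3: new lower bound on duration(n)
    d_hi = t_max - t_n  # step 3: new upper bound on duration(n)
    if preferred < d_lo:  # step 4: too short, raise it to the lower bound
        return d_lo
    elif preferred > d_hi:  # step 4: too long, lower it to the upper bound
        return d_hi
    return preferred  # already feasible, keep it unchanged


# Background B (t_n = 0, preferred 7) and audio A (t_n = 4.5, preferred 4)
# with 7.5 <= t_x <= 8, as in the example of the detailed description.
print(clamp_preferred(7.0, 0.0, 7.5, 8.0))  # 7.5
print(clamp_preferred(4.0, 4.5, 7.5, 8.0))  # 3.5
```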

In the fifth step, the general error criterion for resolving the duration
of each multimedia object is defined as:

$$E = \sum_{n=1}^{N} \{\mathrm{duration}(n) - \mathrm{preferred}(n)\}^2$$

or, substituting $\mathrm{duration}(n) = t_x - t_n$:

$$E = \sum_{n=1}^{N} \{t_x - t_n - \mathrm{preferred}(n)\}^2$$

If we take the derivative of $E$ with respect to $t_x$ and set it to 0, we see
that the optimal solution for the absolute time of label $P_x$ is:

$$t_x = \frac{1}{N} \sum_{n=1}^{N} \{t_n + \mathrm{preferred}(n)\}$$
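The derivative step can be written out explicitly. The last line adds a consequence of step 4: once every preferred duration has been clamped, each term $t_n + \mathrm{preferred}(n)$ lies between $\min\{t_x\}$ and $\max\{t_x\}$, so their mean does as well, and the optimal $t_x$ never violates the bounds of the second step.

```latex
\frac{\partial E}{\partial t_x}
  = 2 \sum_{n=1}^{N} \{t_x - t_n - \mathrm{preferred}(n)\} = 0
  \;\Longrightarrow\;
  N\, t_x = \sum_{n=1}^{N} \{t_n + \mathrm{preferred}(n)\}

\frac{\partial^2 E}{\partial t_x^2} = 2N > 0
  \quad \text{(so this stationary point is the minimum of } E\text{)}

\min\{t_x\} \le t_n + \mathrm{preferred}(n) \le \max\{t_x\} \;\; \forall n
  \;\Longrightarrow\;
  \min\{t_x\} \le t_x \le \max\{t_x\}
```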



Finally, in the sixth step, the corresponding duration of multimedia
object $n$ is calculated with:

$$\mathrm{duration}(n) = t_x - t_n$$
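Putting the six steps together, a compact Python sketch might look like the following. It assumes the start times $t_n$ are already resolved and that the relations of the first step are mutually consistent (it raises `ValueError` otherwise, a case not covered in the text). `EndingObject` and `resolve_label` are hypothetical names; the `min(max(...))` expression is equivalent to the if/else-if of the fourth step whenever the bounds are consistent.

```python
from dataclasses import dataclass


@dataclass
class EndingObject:
    """Timing of one object n that has Px as its end-time label.
    Field names are illustrative, not taken from any standard."""
    start: float      # t_n, the already resolved start time
    minimum: float    # minimum(n)
    preferred: float  # preferred(n)
    maximum: float    # maximum(n)


def resolve_label(objects):
    """Run steps 1-6 for one label Px; return (t_x, [duration(n), ...])."""
    if not objects:
        raise ValueError("no object has this label as its end time")
    # Steps 1-2: tightest bounds on t_x.
    t_min = max(o.start + o.minimum for o in objects)
    t_max = min(o.start + o.maximum for o in objects)
    if t_min > t_max:
        raise ValueError("the duration ranges cannot all be met")
    # Steps 3-4: clamp each preferred(n) into [t_min - t_n, t_max - t_n].
    preferred = [min(max(o.preferred, t_min - o.start), t_max - o.start)
                 for o in objects]
    # Step 5: the least-squares optimum is the mean of t_n + preferred(n).
    t_x = sum(o.start + p for o, p in zip(objects, preferred)) / len(objects)
    # Step 6: duration(n) = t_x - t_n.
    return t_x, [t_x - o.start for o in objects]


t3, durations = resolve_label([EndingObject(0.0, 7.0, 7.0, 8.0),   # B
                               EndingObject(4.5, 3.0, 4.0, 4.0)])  # A
print(t3, durations)  # 7.75 [7.75, 3.25]
```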

BRIEF DESCRIPTION OF THE DRAWINGS 



The foregoing and other objects, aspects and advantages will be better
understood from the following detailed description of a preferred embodiment
of the invention with reference to the drawings, in which:

Figure 1 is a block diagram of one preferred computer system with
multimedia inputs and outputs that uses the method of the present invention;

Figure 2 is a temporal diagram illustrating the problem solved by the
present invention;

Figure 3 is a flow diagram showing the logic of the overall process
according to the invention;

Figure 4 is a flow diagram showing the logic of the process for
calculating the minimum and maximum times in block 302 of Figure 3;

Figure 5 is a flow diagram showing the logic of the process for
calculating the preferred durations and the label time in block 303 of Figure 3;
and

Figure 6 is a flow diagram showing the logic of the process for
calculating the durations of the objects in block 304 of Figure 3.

DETAILED DESCRIPTION OF A PREFERRED 
EMBODIMENT OF THE INVENTION 

Referring now to the drawings, and more particularly to Figure 1, there
is shown in block diagram form a computer system 100 on which the subject
invention may be practiced. The computer system 100 includes a personal
computer (PC) 105 running a windowing operating system and including a
multimedia audio/video capture adaptor 110. A video camera 122 connects to
the adaptor 110, as does an optional playback monitor 124 for multimedia
presentations composed on the computer system 100. Other multimedia
hardware 130 may be included, as well as various input devices, such as a
keyboard (not shown), a cursor pointing device (e.g., a mouse) (not shown),
and a microphone 132 or other audio input device. A monitor 134 displays the
graphical user interface (GUI) of the operating system and application
software. The computer 105 includes secondary memory storage
(e.g., a hard drive) 140 of adequate capacity to store the multimedia
presentation being authored.

The solution to the problem outlined above is best illustrated by a
simple example. Let us consider a presentation that is authored with three
multimedia objects: a video clip (V), an audio clip (A), and a background
image (B). As explained above, the Isis authoring system requires the author
to specify for each multimedia object the duration range, as well as a relative
start and end time. For the three objects in our exemplary presentation, the
parameters are authored as:





Object  Start  End  Minimum duration  Preferred duration  Maximum duration
V       P1     P2   3 seconds         4 seconds           5 seconds
A       P2     P3   3 seconds         4 seconds           4 seconds
B       P1     P3   7 seconds         7 seconds           8 seconds



The labels P1, P2, and P3 indicate how the various multimedia objects
are temporally related. This means, for example, that objects V and B start at
the same time. The temporal aspect of this authored presentation is depicted
more clearly in Figure 2.

As shown in Figure 2, the background image B starts at a point P1 and
ends at a point P3. The duration times are shown in brackets as 7,7,8,
corresponding to 7 seconds minimum duration, 7 seconds preferred duration,
and 8 seconds maximum duration. Similarly, the video clip V begins at the
point P1 and ends at a point P2, and the audio clip A begins at the point P2
and ends at the point P3, again with the duration times shown in brackets.

The player (the client) of the multimedia presentation first receives the
multimedia object parameters for video clip V and background B. The player
then initializes the time of point P1 (arbitrarily) to $t_1 = 0$ and starts playing
the two objects V and B with their preferred durations. For the video clip V,
this means it will be played at the corresponding preferred speed. If no network
or playback delays occurred, the video would finish after four seconds. However,
if a delay of 1/2 second occurred during playback, the time of point P2 is not
$t_2 = 4$, but $t_2 = 4.5$. The player next attempts to resolve the durations of B
and A. It does this using the relations:

$$t_1 + 7 \le t_3 \le t_1 + 8$$
$$t_2 + 3 \le t_3 \le t_2 + 4$$

Knowing that $t_1 = 0$ and $t_2 = 4.5$, we obtain:

$$7 \le t_3 \le 8$$
$$7.5 \le t_3 \le 8.5$$

These are combined into:

$$7.5 \le t_3 \le 8$$

With this we can recalculate the duration range for both the background B and
audio clip A. Using:

$$\mathrm{duration}(B) = t_3 - t_1 = t_3$$
$$\mathrm{duration}(A) = t_3 - t_2 = t_3 - 4.5$$

we get

$$7.5 \le \mathrm{duration}(B) \le 8.0$$
$$3.0 \le \mathrm{duration}(A) \le 3.5$$

We next use these new duration ranges to redefine the preferred durations of
both audio clip A and background B. For background B, we see that the
preferred duration cannot be met, and we have to settle for the value closest to
it, which is now 7.5 seconds. Similarly, the preferred duration of audio clip A
changes to 3.5 seconds:

$$\mathrm{preferred}(B) = 7.5$$
$$\mathrm{preferred}(A) = 3.5$$

Finally, we can use these now feasible preferred durations to determine a good
value for the time at point P3, and thus for the durations of the objects B and
A. We do this by defining an error criterion on the durations as the sum of the
squared deviations from the (updated) preferred durations:

$$E = \{\mathrm{duration}(B) - \mathrm{preferred}(B)\}^2 + \{\mathrm{duration}(A) - \mathrm{preferred}(A)\}^2$$

Using the definitions of the durations from above and the recalculated
preferred durations, this is rewritten as:

$$E = \{t_3 - 7.5\}^2 + \{t_3 - 4.5 - 3.5\}^2 = \{t_3 - 7.5\}^2 + \{t_3 - 8.0\}^2$$

Minimizing this error with respect to $t_3$ simply yields:

$$t_3 = \tfrac{1}{2}(7.5 + 8.0) = 7.75$$

and the durations are

$$\mathrm{duration}(B) = 7.75$$
$$\mathrm{duration}(A) = 3.25$$
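The arithmetic of this example can be checked with a short sketch. It packs each object as a hypothetical `(t_n, minimum, preferred, maximum)` tuple and assumes the bounds are consistent. For comparison, it also applies the same steps with $t_2 = 4$, i.e., without the 1/2-second delay.

```python
def resolve(objects):
    """Steps 1-6 for one label; objects are (t_n, minimum, preferred, maximum)
    tuples. Assumes the bounds are consistent (no feasibility check)."""
    lo = max(t + mn for t, mn, _, _ in objects)   # min{t_x}
    hi = min(t + mx for t, _, _, mx in objects)   # max{t_x}
    # Clamp each preferred duration into [lo - t_n, hi - t_n].
    pref = [min(max(p, lo - t), hi - t) for t, _, p, _ in objects]
    # Least-squares label time and the resulting durations.
    t_x = sum(t + p for (t, _, _, _), p in zip(objects, pref)) / len(objects)
    return t_x, [t_x - t for t, _, _, _ in objects]


# Objects ending at P3, with t_1 = 0: background B starts at P1, audio A at P2.
for t2 in (4.5, 4.0):  # first with the 1/2-second delay, then without it
    B = (0.0, 7.0, 7.0, 8.0)
    A = (t2, 3.0, 4.0, 4.0)
    print(t2, resolve([B, A]))
# 4.5 (7.75, [7.75, 3.25])
# 4.0 (7.5, [7.5, 3.5])
```

Without the delay the same steps give $t_3 = 7.5$, so even then neither B nor A plays at exactly its preferred duration: the preferred durations of V and A add up to 8 seconds, while B prefers 7 seconds. The delay only shifts this compromise by a quarter second.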



From this example, it will be understood that the solution to the
problem consists of two parts. First, it must be defined what information has to
be available to the client in order to be able to determine the multimedia object
durations. And second, the durations themselves must be resolved.

A client (i.e., player of the multimedia presentation) must receive five
items of information for each multimedia object. These items are the two
labels, one for the object's start time and one for its end time, and the three
durations: the minimum, maximum, and preferred duration. In the case of
video, audio, and other multimedia objects that have a playback speed, the
preferred duration must correspond to the "regular" playback speed of the
object. The information on a particular multimedia object must be delivered to
the client prior to starting playback of the object.
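One way to picture the per-object information is a small record holding the five items. This is a sketch only: the class name, the field names, and the consistency check (minimum ≤ preferred ≤ maximum, which every row of the example table satisfies but which the text does not state as a rule) are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectTimingDescriptor:
    """The five timing items a client receives per object (hypothetical names)."""
    start_label: str   # label of the object's start time, e.g. "P1"
    end_label: str     # label of the object's end time, e.g. "P2"
    minimum: float     # minimum duration in seconds
    preferred: float   # preferred duration; "regular" speed for timed media
    maximum: float     # maximum duration in seconds

    def __post_init__(self):
        # Assumed sanity check; minimum == maximum (a zero-width range) is allowed.
        if not self.minimum <= self.preferred <= self.maximum:
            raise ValueError(
                f"bad duration range for {self.start_label}->{self.end_label}")


# The three objects of the example presentation.
PRESENTATION = {
    "V": ObjectTimingDescriptor("P1", "P2", 3.0, 4.0, 5.0),
    "A": ObjectTimingDescriptor("P2", "P3", 3.0, 4.0, 4.0),
    "B": ObjectTimingDescriptor("P1", "P3", 7.0, 7.0, 8.0),
}
```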

When playback has finished for a particular multimedia object, the
absolute time of a certain label becomes known. This means that one or
more label times can be resolved using this new information. The time stamp
resolution is therefore progressive over time, as more information becomes
available in the form of actual multimedia object durations and the arrival of
information on objects that are to be played in the (near) future.
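A minimal sketch of the bookkeeping behind this progressive resolution follows. It rests on two assumptions that are not spelled out in the text: a label can be resolved once the start times of all objects ending at it are known, and the descriptors of all those objects have already arrived. The function name and the `(start_label, end_label)` layout are hypothetical. In the example, P2 becomes resolvable as soon as $t_1$ is known; the observed end of V then fixes $t_2 = 4.5$, after which P3 becomes resolvable.

```python
def resolvable_labels(descriptors, known):
    """Return, sorted, the labels not yet in `known` for which every object
    ending at that label has a start label whose time is already known.

    descriptors: object name -> (start_label, end_label)   (hypothetical layout)
    known:       label -> resolved absolute time in seconds
    """
    starts_by_end = {}
    for start_label, end_label in descriptors.values():
        starts_by_end.setdefault(end_label, []).append(start_label)
    return sorted(
        label
        for label, start_labels in starts_by_end.items()
        if label not in known and all(s in known for s in start_labels)
    )


example = {"V": ("P1", "P2"), "A": ("P2", "P3"), "B": ("P1", "P3")}
print(resolvable_labels(example, {"P1": 0.0}))             # ['P2']
print(resolvable_labels(example, {"P1": 0.0, "P2": 4.5}))  # ['P3']
```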

To resolve the actual label time, and the corresponding duration of the 
multimedia objects that have that label for their respective end times, the 
following steps are taken: 
1. Collect all the dependency relations for the label $P_x$ by taking all
objects $n$ that have $P_x$ as the label for their end time:

$$t_n + \mathrm{minimum}(n) \le t_x \le t_n + \mathrm{maximum}(n), \quad n = 1, \ldots, N$$

Here $t_n$ is the start time of object $n$, and $N$ is the number of objects.

2. Use the $N$ relations to calculate the tightest bounds on $t_x$:

$$\min\{t_x\} \le t_x \le \max\{t_x\}$$

with

$$\min\{t_x\} = \max_{n = 1, \ldots, N} \{t_n + \mathrm{minimum}(n)\}$$
$$\max\{t_x\} = \min_{n = 1, \ldots, N} \{t_n + \mathrm{maximum}(n)\}$$

3. Recalculate the bounds on the duration of each object $n$ by using

$$\mathrm{duration}(n) = t_x - t_n$$

to get

$$\min\{t_x\} - t_n \le \mathrm{duration}(n) \le \max\{t_x\} - t_n, \quad n = 1, \ldots, N$$

4. Recalculate the preferred duration of each object $n$:

    if (preferred(n) < min{t_x} - t_n) then
        preferred(n) = min{t_x} - t_n
    else if (preferred(n) > max{t_x} - t_n) then
        preferred(n) = max{t_x} - t_n
    end if

5. The general error criterion for resolving the duration of each
multimedia object is defined as:

$$E = \sum_{n=1}^{N} \{\mathrm{duration}(n) - \mathrm{preferred}(n)\}^2$$

or, substituting $\mathrm{duration}(n) = t_x - t_n$:

$$E = \sum_{n=1}^{N} \{t_x - t_n - \mathrm{preferred}(n)\}^2$$

If we take the derivative of $E$ with respect to $t_x$ and set it to 0, we
see that the optimal solution for the absolute time $t_x$ of label $P_x$ is:

$$t_x = \frac{1}{N} \sum_{n=1}^{N} \{t_n + \mathrm{preferred}(n)\}$$



6. The corresponding duration of multimedia object $n$ is calculated with:

$$\mathrm{duration}(n) = t_x - t_n$$

The entire process of steps 1 through 6 is summarized in Figure 3. The
inputs to the process, as in step 1, supra, are shown at block 301. Step 2
calculates the minimum and maximum end times over all multimedia objects in
function block 302. This is described in more detail in the description of
Figure 4, infra. Next, steps 3, 4 and 5 are combined in function block 303.
This is described in more detail in the description of Figure 5, infra. Finally,
the durations of the objects are calculated in function block 304, which is
described in more detail in the description of Figure 6, infra.

Step 2 (i.e., block 302 of Figure 3) is illustrated in more detail in Figure
4. The process is initialized in function block 401 before entering the
processing loop. The value of $n$ is incremented by one in function block 402 at
the beginning of the processing loop. A test is made in decision block 403 to
determine if the minimum end time is less than the start time of object $n$ plus
the minimum duration of object $n$. If so, the minimum end time is set to that
value in function block 404. If not, a test is made in decision block 405 to
determine if the maximum end time is greater than the start time of object $n$
plus its maximum duration. If so, the maximum end time is set to that value in
function block 406. Finally, a test is made in decision block 407 to determine
if all objects have been processed and, if not, the process loops back to
function block 402, where the value of $n$ is again incremented, and the
minimum and maximum end times are updated for the next multimedia object.
This processing continues until the minimum and maximum end times over all
$N$ multimedia objects have been calculated.
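The loop of Figure 4 can be sketched in Python with explicit indices so that each statement maps to a block. Two details are assumptions: block 401 is taken to initialize the bounds to minus and plus infinity, and the block 405 test is applied to every object, including one whose lower bound was just raised in block 404, so that a single object can tighten both bounds. The function name is hypothetical.

```python
import math


def min_max_end_times(starts, minimums, maximums):
    """Loop form of step 2 (Figure 4): tightest bounds on the end time t_x."""
    t_min, t_max = -math.inf, math.inf      # block 401 (assumed initial values)
    for n in range(len(starts)):            # blocks 402 and 407: next object
        lower = starts[n] + minimums[n]
        upper = starts[n] + maximums[n]
        if t_min < lower:                   # block 403
            t_min = lower                   # block 404
        if t_max > upper:                   # block 405
            t_max = upper                   # block 406
    return t_min, t_max


# B and A from the example: start times 0 and 4.5.
print(min_max_end_times([0.0, 4.5], [7.0, 3.0], [8.0, 4.0]))  # (7.5, 8.0)
```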

Steps 3, 4 and 5 (i.e., block 303 in Figure 3) are illustrated in more
detail in Figure 5. The process is initialized in function block 501 before
entering the processing loop. The value of $n$ is incremented by one in function
block 502 at the beginning of the processing loop. A test is made in decision
block 503 to determine if the preferred duration is greater than the minimum
end time less the start time of the current object $n$. If not, the preferred
duration is set to this value in function block 504; otherwise, a further test is
made in decision block 505 to determine if the preferred duration is less than
the maximum end time less the start time of the current object $n$. If not, the
preferred duration is set to this value in function block 506; otherwise, the
preferred duration is kept as the preferred duration of object $n$ in function
block 507. Then, in function block 508, the start time of object $n$ plus its
preferred duration is added to a running sum. A test is made in decision block
509 to determine if all objects have been processed and, if not, the process
loops back to function block 502, where the value of $n$ is again incremented.
When all objects have been processed, the label time $t_x$ is computed as the
sum divided by $N$, the number of multimedia objects, in function block 510.
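A loop-style Python sketch of Figure 5 follows, with each statement tagged by the block it is meant to mirror. It takes the bounds from step 2 as inputs and assumes that block 508 accumulates $t_n + \mathrm{preferred}(n)$, consistent with the formula of step 5; the function name is hypothetical.

```python
def label_time(starts, preferreds, t_min, t_max):
    """Loop form of steps 3-5 (Figure 5): clamp each preferred duration and
    average t_n + preferred(n) to obtain the label time t_x."""
    total = 0.0                                      # block 501
    for n in range(len(starts)):                     # blocks 502 and 509
        if not preferreds[n] > t_min - starts[n]:    # block 503
            p = t_min - starts[n]                    # block 504
        elif not preferreds[n] < t_max - starts[n]:  # block 505
            p = t_max - starts[n]                    # block 506
        else:
            p = preferreds[n]                        # block 507
        total += starts[n] + p                       # block 508
    return total / len(starts)                       # block 510


# B and A from the example, with 7.5 <= t_x <= 8 from step 2.
print(label_time([0.0, 4.5], [7.0, 4.0], 7.5, 8.0))  # 7.75
```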

Step 6 (i.e., block 304 in Figure 3) is shown in more detail in Figure 6.
The process begins by initializing $n$ to zero in function block 601. The value
of $n$ is incremented by one in function block 602 at the beginning of the
processing loop. The duration of each object $n$ is calculated in function block
603 as the calculated time $t_x$ minus the start time $t_n$ of object $n$. After
each calculation, a test is made in decision block 604 to determine if all
objects have been processed. If not, the process loops back to function block
602, where $n$ is again incremented and the duration of the next object is
calculated. The process ends when all $N$ objects have been processed.

While the invention has been described in terms of a single preferred
embodiment, those skilled in the art will recognize that the invention can be
practiced with modification within the spirit and scope of the appended
claims.